In the first part of the course, we will go over some statistical preliminaries and the corresponding computational aspects.
Let us start with an occupancy study. Suppose we have a site that is being considered for development, and there is a species of interest that might be affected by this development. Hence we need to study what proportion of the area is occupied by the species of interest. If this proportion is not very large, we may go ahead with the development.
Suppose we divide the site into several equal-area cells. Suppose all cells have similar habitats (identical). Further, we assume that occupancy of one cell does not affect occupancy of the other cells (independence). Let \(N\) be the total number of cells.
Let \(Y_i\) be the occupancy status of the i-th cell. This is unknown and hence is a random variable. It takes values in \(\{0,1\}\), 0 meaning unoccupied and 1 meaning occupied. This is a Bernoulli random variable. We denote this by \(Y_{i}\sim Bernoulli(\phi)\). The random variable \(Y_{i}\) takes value 1 with probability \(\phi\). This is the probability of occupancy. The value of \(\phi\) is unknown; this is the parameter of the distribution.
Suppose we visit a subset of \(n\) of these cells, selected using simple random sampling without replacement. The observations are denoted by \(y_1,y_2,\ldots,y_n\). We can use these to make inferences about the unknown parameter \(\phi\). The main tool for such inductive inference (data to population, and not hypothesis to prediction) is the likelihood function.
Suppose the observed data are (0,1,1,0,0). Then we can compute the probability of observing these data under various values of the parameter \(\phi\) (assuming independent, identically distributed Bernoulli random variables). It can be written as: \(L(\phi;y_{1},y_{2},...,y_{n})=\prod_{i=1}^{n}P(y_{i};\phi)=\prod_{i=1}^{n}\phi^{y_{i}}(1-\phi)^{1-y_{i}}\)
Notice that this is a function of the parameter \(\phi\), with the data held fixed. The likelihood function, or equivalently the log-likelihood function, quantifies the relative support for different values of the parameter. Hence only ratios of likelihoods are meaningful.
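As a quick illustration (a minimal sketch; the object names are ours), we can evaluate this likelihood for the example data over a grid of \(\phi\) values in R:
y = c(0, 1, 1, 0, 0)                  # the example data
phi.grid = seq(0.01, 0.99, by=0.01)   # candidate values of phi
# likelihood of the data at each candidate value
lik = sapply(phi.grid, function(phi) prod(phi^y * (1-phi)^(1-y)))
plot(phi.grid, lik, type="l", xlab="phi", ylab="likelihood")
phi.grid[which.max(lik)]              # maximized near the sample mean, 0.4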
A natural approach to estimation (inference) of \(\phi\) is to choose the value that is better supported than any other value in the parameter space \((0,1)\). This is called the maximum likelihood estimator (MLE). We can show that it turns out to be: \(\hat{\phi}=\frac{1}{n}\sum y_{i}\) The value computed from the observed data is called an estimate. It is a fixed quantity because the data are observed and hence no longer random.
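To verify this, write the log-likelihood as \(\log L(\phi;y)=\sum y_{i}\log\phi+(n-\sum y_{i})\log(1-\phi)\). Differentiating with respect to \(\phi\) and setting the derivative to zero gives \(\frac{\sum y_{i}}{\phi}-\frac{n-\sum y_{i}}{1-\phi}=0\), whose solution is \(\hat{\phi}=\frac{1}{n}\sum y_{i}\).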
As scientists, would you stop at reporting this? I suspect not. If this estimate is large, say 0.85, the developer is going to say ‘you just got lucky (or, worse, you cheated) with your particular sample’. A natural question to ask then is: ‘how different would this estimate have been if someone else had conducted the experiment?’. In this case, the ‘experiment to be repeated’ is fairly uncontroversial: we take another simple random sample without replacement from the study area. However, that is not always the case, as we will see when we deal with the regression model.
The sampling distribution is the distribution of the estimates that one would have obtained had one conducted these replicate experiments. It is possible to get an approximation to this sampling distribution in a very general fashion if we use the method of maximum likelihood. In many situations, it can be shown that the sampling distribution is \(\hat{\phi}\sim N\left(\phi,\frac{1}{n}I^{-1}(\phi)\right)\) where \(I(\phi)=-\frac{1}{n}\frac{d^{2}}{d\phi^{2}}\log L(\phi;y)\)
This is also called the Hessian, or curvature, matrix of the log-likelihood function. The higher the curvature, the less variable the estimates are from one experiment to another, and hence the resultant ‘estimate’ is considered highly reliable.
Confidence interval: This is just a set of values that covers the estimates from 95% of the replicate experiments. The experiments are not actually replicated, and hence this simply tells us what the various outcomes could be. Our decisions can be based on this variation as long as we all agree on the experiment that could be replicated. We are simply covering our bases against the various outcomes and protecting ourselves from future challenges. If we use the maximum likelihood estimator, we can obtain this as: \(\left(\hat{\phi}-1.96\sqrt{\frac{I^{-1}(\hat{\phi})}{n}},\ \hat{\phi}+1.96\sqrt{\frac{I^{-1}(\hat{\phi})}{n}}\right)\) You will notice that as we increase the sample size, the width of this interval converges to zero. That is, as we increase the sample size, the MLE converges to the true parameter value. This is called the ‘consistency’ of an estimator, an essential property of any statistical inferential procedure.
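For the Bernoulli model the Fisher information has the closed form \(I(\phi)=\frac{1}{\phi(1-\phi)}\), so the interval can be computed directly. A minimal sketch for the example data (with n = 5, the asymptotic approximation is of course crude):
phi.hat = 0.4                           # MLE for the example data (0,1,1,0,0)
n = 5
se = sqrt(phi.hat * (1 - phi.hat) / n)  # equals sqrt(I^{-1}(phi.hat)/n)
phi.hat + c(-1.96, 1.96) * se           # approximate 95% confidence interval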
All the above statements seem logical but fake at the same time! No one repeats the same experiment (although replication consistency is an essential scientific requirement). What if we have time series? We can never replicate a time series. So then should we simply take the estimated value prima facie? That also seems incorrect scientifically. So where is the uncertainty in our mind coming from? According to the Bayesian paradigm, it arises because of our ‘personal’ uncertainty about the parameter values.
Prior distribution: Suppose we have some idea about what values of occupancy are more likely than others before any data are collected. This can be quantified as a probability distribution on the parameter space (0,1). This distribution can be anything, unimodal or bimodal or even multimodal! Let us denote this by \(\pi(\phi)\). How do we change this after we observe the data?
Posterior distribution: This is the quantification of uncertainty after we observe the data. Usually observing the data decreases our uncertainty, although this is not guaranteed. The posterior distribution is obtained by: \(\pi(\phi|y)=\frac{L(\phi;y)\pi(\phi)}{\int L(\phi;y)\pi(\phi)\,d\phi}\)
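For our Bernoulli model, the integral in the denominator is available in closed form when the prior is a Beta distribution: if \(\pi(\phi)=Beta(a,b)\), then \(\pi(\phi|y)=Beta\left(a+\sum y_{i},\,b+n-\sum y_{i}\right)\). We will use this conjugate pair later to check the MCMC output.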
Credible interval: This is obtained by using the percentiles of the posterior distribution.
An interesting result follows. As we increase the sample size, the Bayesian posterior, for ANY prior, converges to a distribution that looks very much like the frequentist sampling distribution of the MLE. That is, \(\pi(\phi|y)\thickapprox N\left(\hat{\phi},\frac{1}{n}I^{-1}(\hat{\phi})\right)\) There are subtle differences that we are going to ignore here. Qualitatively, what this says is that for a large sample size the credible and confidence intervals will be indistinguishable, and the effect of the choice of the prior distribution vanishes. How large should the sample size be for this to happen? It depends on the number of parameters in the model and on how strong the prior distribution is.
Bayesian and ML inference using MCMC and data cloning
We now show how one can compute the posterior distribution for any choice of the prior distribution without analytically calculating the integral in the denominator. We will generate the data under the Bernoulli model. You can change the parameters as you wish when you run the code.
Simulate a simple data set with 30 observations:
library(dclone)
## Loading required package: coda
## Loading required package: parallel
## Loading required package: Matrix
## dclone 2.3-1 2023-04-07
phi.true = 0.3
n = 30
Y = rbinom(n,1,phi.true)
table(Y)
## Y
## 0 1
## 20 10
Analytical MLE:
(MLE.est = sum(Y)/n)
## [1] 0.3333333
We will use the JAGS program and the dclone R package.
First, we need to define the model function. This is the critical component.
Occ.model = function(){
# Likelihood
for (i in 1:n){
Y[i] ~ dbin(phi_occ, 1)
}
# Prior
phi_occ ~ dbeta(1, 1)
}
Second, we need to provide the data to the model and generate random numbers from the posterior. We will discuss different options later.
dat = list(Y=Y, n=n)
Occ.Bayes = jags.fit(data=dat, params="phi_occ", model=Occ.model)
## Registered S3 method overwritten by 'R2WinBUGS':
## method from
## as.mcmc.list.bugs dclone
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 1
## Total graph size: 33
##
## Initializing model
summary(Occ.Bayes)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.3426129 0.0811765 0.0006628 0.0008021
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.1956 0.2840 0.3394 0.3981 0.5087
plot(Occ.Bayes)
The summary describes the posterior distribution: its mean, standard deviation, and quantiles. The 2.5% and 97.5% quantiles give an equal-tailed 95% credible interval, here approximately (0.196, 0.509).
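Because the Beta(1,1) prior used in the model is conjugate here (see the posterior formula earlier), we can check the MCMC output against the exact posterior, Beta(1 + 10, 1 + 20):
a = 1 + sum(Y)                  # 11
b = 1 + n - sum(Y)              # 21
a / (a + b)                     # exact posterior mean, 0.34375 (MCMC gave 0.3426)
qbeta(c(0.025, 0.975), a, b)    # exact 95% credible interval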
This was quite easy. Now we use data cloning to compute the MLE and its variance using MCMC.
As you all know, at least in this simple situation, we can write down
the likelihood function analytically. We can also use calculus and/or
numerical optimization such as the optim() function in R to
get the location of the maximum and its Hessian matrix. But suppose we
do not want to go through all of that and instead want to use the MCMC
algorithm. Why? Because it is easy and can be generalized to more
complex hierarchical models.
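For completeness, here is a minimal optim() sketch for our Bernoulli example (the object names are ours; this is not needed for what follows):
negloglik = function(phi) -sum(dbinom(Y, 1, phi, log=TRUE))  # negative log-likelihood
fit = optim(par=0.5, fn=negloglik, method="Brent",
    lower=0.001, upper=0.999, hessian=TRUE)
fit$par                  # numerical MLE, matches sum(Y)/n
sqrt(1 / fit$hessian)    # asymptotic standard error from the observed information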
Earlier we noted that as we increase the sample size, the Bayesian posterior converges to the sampling distribution of the MLE. We obviously cannot increase the sample size: the data are given to us. Data cloning uses a computational trick to increase the sample size. We clone the data!
Imagine a sequence of K independent researchers. The first researcher observes the data and computes the posterior using the prior \(\pi(\phi)\). The second researcher happens to observe exactly the same data and uses the first researcher's posterior as their prior; the third does the same with the second researcher's posterior, and so on.
What is happening with these sequential posterior distributions?
The posterior distribution is converging to a single point, that is, a degenerate distribution. That point is the MLE!
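Formally, the posterior based on K copies of the data is \(\pi_{K}(\phi|y)=\frac{L(\phi;y)^{K}\pi(\phi)}{\int L(\phi;y)^{K}\pi(\phi)\,d\phi}\). As K increases, this distribution concentrates at the maximizer of the likelihood, and its variance is approximately \(\frac{1}{K}\) times the asymptotic variance of the MLE. This is why, later on, the posterior SD multiplied by \(\sqrt{K}\) is reported as the asymptotic standard error.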
You can play with the number of clones and see the effect on the posterior distribution using this Shiny app (R code for the app)
We do not need to implement this procedure sequentially. The matrix of these K datasets is of dimension (n,K) with identical columns.
\(\left[\begin{array}{cccccccccc} y_{1} & y_{1} & y_{1} & y_{1} & y_{1} & y_{1} & y_{1} & y_{1} & y_{1} & y_{1}\\ y_{2} & y_{2} & y_{2} & y_{2} & y_{2} & y_{2} & y_{2} & y_{2} & y_{2} & y_{2}\\ y_{3} & y_{3} & y_{3} & y_{3} & y_{3} & y_{3} & y_{3} & y_{3} & y_{3} & y_{3}\\ y_{4} & y_{4} & y_{4} & y_{4} & y_{4} & y_{4} & y_{4} & y_{4} & y_{4} & y_{4}\\ y_{5} & y_{5} & y_{5} & y_{5} & y_{5} & y_{5} & y_{5} & y_{5} & y_{5} & y_{5} \end{array}\right]\)
We use the Bayesian procedure to analyze these data. The model function used previously can be used with a minor modification to do this.
Occ.model.dc = function(){
# Likelihood
for(k in 1:ncl){
for (i in 1:n){
Y[i,k] ~ dbin(phi_occ, 1)
}
}
# Prior
phi_occ ~ dbeta(1, 1)
}
To match this change in the model, we need to turn the original data into an array.
Y = array(Y, dim=c(n, 1))
Y = dcdim(Y)
When defining the data, we need to add another index ncl
for the cloned dimension. It gets multiplied by the number of
clones.
dat = list(Y=Y, n=n, ncl=1)
# 2 clones of the Y array
dclone(Y, 2)
## clone.1 clone.2
## [1,] 1 1
## [2,] 0 0
## [3,] 0 0
## [4,] 0 0
## [5,] 1 1
## [6,] 0 0
## [7,] 1 1
## [8,] 1 1
## [9,] 0 0
## [10,] 0 0
## [11,] 0 0
## [12,] 0 0
## [13,] 0 0
## [14,] 0 0
## [15,] 0 0
## [16,] 1 1
## [17,] 0 0
## [18,] 1 1
## [19,] 1 1
## [20,] 0 0
## [21,] 1 1
## [22,] 1 1
## [23,] 0 0
## [24,] 0 0
## [25,] 0 0
## [26,] 0 0
## [27,] 0 0
## [28,] 0 0
## [29,] 0 0
## [30,] 1 1
## attr(,"n.clones")
## [1] 2
## attr(,"n.clones")attr(,"method")
## [1] "dim"
## attr(,"n.clones")attr(,"method")attr(,"drop")
## [1] TRUE
# 2 clones of the data list - this is not what we want
dclone(dat, 2)
## $Y
## clone.1 clone.2
## [1,] 1 1
## [2,] 0 0
## [3,] 0 0
## [4,] 0 0
## [5,] 1 1
## [6,] 0 0
## [7,] 1 1
## [8,] 1 1
## [9,] 0 0
## [10,] 0 0
## [11,] 0 0
## [12,] 0 0
## [13,] 0 0
## [14,] 0 0
## [15,] 0 0
## [16,] 1 1
## [17,] 0 0
## [18,] 1 1
## [19,] 1 1
## [20,] 0 0
## [21,] 1 1
## [22,] 1 1
## [23,] 0 0
## [24,] 0 0
## [25,] 0 0
## [26,] 0 0
## [27,] 0 0
## [28,] 0 0
## [29,] 0 0
## [30,] 1 1
## attr(,"n.clones")
## [1] 2
## attr(,"n.clones")attr(,"method")
## [1] "dim"
## attr(,"n.clones")attr(,"method")attr(,"drop")
## [1] TRUE
##
## $n
## [1] 30 30
## attr(,"n.clones")
## [1] 2
## attr(,"n.clones")attr(,"method")
## [1] "rep"
##
## $ncl
## [1] 1 1
## attr(,"n.clones")
## [1] 2
## attr(,"n.clones")attr(,"method")
## [1] "rep"
Notice that this changes n also. We do not want that; we want to keep n unchanged.
dclone(dat, 2, unchanged="n", multiply="ncl")
## $Y
## clone.1 clone.2
## [1,] 1 1
## [2,] 0 0
## [3,] 0 0
## [4,] 0 0
## [5,] 1 1
## [6,] 0 0
## [7,] 1 1
## [8,] 1 1
## [9,] 0 0
## [10,] 0 0
## [11,] 0 0
## [12,] 0 0
## [13,] 0 0
## [14,] 0 0
## [15,] 0 0
## [16,] 1 1
## [17,] 0 0
## [18,] 1 1
## [19,] 1 1
## [20,] 0 0
## [21,] 1 1
## [22,] 1 1
## [23,] 0 0
## [24,] 0 0
## [25,] 0 0
## [26,] 0 0
## [27,] 0 0
## [28,] 0 0
## [29,] 0 0
## [30,] 1 1
## attr(,"n.clones")
## [1] 2
## attr(,"n.clones")attr(,"method")
## [1] "dim"
## attr(,"n.clones")attr(,"method")attr(,"drop")
## [1] TRUE
##
## $n
## [1] 30
##
## $ncl
## [1] 2
## attr(,"n.clones")
## [1] 2
## attr(,"n.clones")attr(,"method")
## [1] "multi"
The dc.fit function takes the familiar arguments to
determine how to clone the data list.
Occ.DC = dc.fit(data=dat, params="phi_occ", model=Occ.model.dc,
n.clones=c(1, 2, 5), unchanged="n", multiply="ncl")
##
## Fitting model with 1 clone
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 1
## Total graph size: 34
##
## Initializing model
##
##
## Fitting model with 2 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 60
## Unobserved stochastic nodes: 1
## Total graph size: 64
##
## Initializing model
##
##
## Fitting model with 5 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 150
## Unobserved stochastic nodes: 1
## Total graph size: 154
##
## Initializing model
summary(Occ.DC)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
## Number of clones = 5
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD DC SD Naive SE Time-series SE R hat
## phi_occ 0.3352 0.03816 0.08532 0.0003116 0.0003994 1
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.2623 0.3096 0.3345 0.3607 0.4117
plot(Occ.DC)
Notice the Mean, DC SD, and
R hat columns in the summary. These refer to the maximum
likelihood estimate, the asymptotic standard error (the SD of the posterior
times the square root of K), and the Gelman-Rubin diagnostic:
coef(Occ.DC) # MLE
## phi_occ
## 0.3351961
dcsd(Occ.DC) # SE=SD*sqrt(K)
## phi_occ
## 0.08532459
## attr(,"method")
## Y n ncl
## "dim" NA "multi"
gelman.diag(Occ.DC) # R hat
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## phi_occ 1 1
Summaries of the posterior distributions for the different numbers of
clones are saved and we can print these out with the
dctable() command. We can visualize these with the
plot function.
dctable(Occ.DC)
## $phi_occ
## n.clones mean sd 2.5% 25% 50% 75%
## 1 1 0.3449931 0.08320118 0.1931794 0.2864487 0.3411990 0.3997247
## 2 2 0.3380204 0.06024657 0.2253682 0.2955537 0.3363799 0.3782251
## 3 5 0.3351961 0.03815832 0.2622565 0.3095898 0.3344979 0.3606665
## 97.5% r.hat
## 1 0.5179685 1.0002029
## 2 0.4595045 0.9999392
## 3 0.4116999 1.0004954
##
## attr(,"class")
## [1] "dctable"
dctable(Occ.DC) |> plot()
Some data cloning related diagnostics are printed with the
dcdiag() function. We will discuss these statistics in
detail later. The most important thing to check is that
lambda.max decreases with the
number of clones, i.e. that the solid line in the plot follows the reference line.
dcdiag(Occ.DC)
## n.clones lambda.max ms.error r.squared r.hat
## 1 1 0.006922436 0.0026932905 0.0006577752 1.0002029
## 2 2 0.003629650 0.0057757427 0.0016336320 0.9999392
## 3 5 0.001456057 0.0006324413 0.0002696127 1.0004954
dcdiag(Occ.DC) |> plot()
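In dclone, lambda.max is the largest eigenvalue of the posterior variance-covariance matrix, which for an identifiable model should decay roughly like 1/K. A quick check against the table above:
0.006922436 / c(1, 2, 5)    # expected 1/K decay from the K = 1 value
# compare with the observed 0.006922436, 0.003629650, 0.001456057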
Now we will generalize these models to account for covariates. We will consider logistic regression, but also comment on how to change it to probit regression easily (see the note after the Bayesian model function below). Similarly, we show how this basic prototype can be modified to do linear and non-linear regression, Poisson regression, etc.
n = 30 # sample size
X1 = rnorm(n) # a covariate
X = model.matrix(~X1)
beta.true = c(0.5, 1)
link_mu = X %*% beta.true # logit scale
phi_occ = plogis(link_mu) # prob scale
Y = rbinom(n, 1, phi_occ)
Maximum likelihood estimate using glm():
MLE.est = glm(Y ~ X1, family="binomial")
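To compare these with the Bayesian and data cloning results below, one can print the estimated coefficients and their standard errors:
coef(MLE.est)                 # ML estimates of beta
sqrt(diag(vcov(MLE.est)))     # asymptotic standard errors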
Bayesian analysis
Occ.model = function(){
# Likelihood
for (i in 1:n){
phi_occ[i] <- ilogit(X[i,] %*% beta)
Y[i] ~ dbin(phi_occ[i], 1)
}
# Prior
beta[1] ~ dnorm(0, 1)
beta[2] ~ dnorm(0, 1)
}
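As promised, changing this to probit regression only requires changing the link function: JAGS provides phi(), the standard normal CDF, alongside ilogit(). A sketch of the lines that change:
# inside the model function, replace the logit link with the probit link:
#   phi_occ[i] <- phi(X[i,] %*% beta)
# when simulating data, use pnorm() instead of plogis():
#   phi_occ = pnorm(link_mu)
# and for the glm() comparison:
#   MLE.est = glm(Y ~ X1, family=binomial(link="probit"))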
Now we need to provide the data to the model and generate random numbers from the posterior. We will discuss different options later.
dat = list(Y=Y, X=X, n=n)
Occ.Bayes = jags.fit(data=dat, params="beta", model=Occ.model)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 2
## Total graph size: 186
##
## Initializing model
summary(Occ.Bayes)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] -0.1373 0.3839 0.003135 0.004150
## beta[2] 1.1414 0.4994 0.004077 0.005254
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -0.9026 -0.3918 -0.136 0.1189 0.6079
## beta[2] 0.2114 0.7971 1.119 1.4632 2.1778
plot(Occ.Bayes)
pairs(Occ.Bayes)
Now we modify this to get the MLE using data cloning.
Occ.model_dc = function(){
# Likelihood
for (k in 1:ncl){
for (i in 1:n){
phi_occ[i,k] <- ilogit(X[i,,k] %*% beta)
Y[i,k] ~ dbin(phi_occ[i,k],1)
}
}
# Prior
beta[1] ~ dnorm(0, 1)
beta[2] ~ dnorm(0, 1)
}
Now we need to provide the data to the model and generate random numbers from the posterior. We will discuss different options later.
Y = array(Y, dim=c(n, 1))
X = array(X, dim=c(dim(X), 1))
# clone the objects
Y = dcdim(Y)
X = dcdim(X)
Data cloning with dc.fit():
dat = list(Y=Y, X=X, n=n, ncl=1)
Occ.DC = dc.fit(data=dat, params="beta", model=Occ.model_dc,
n.clones=c(1, 2, 5), unchanged="n", multiply="ncl")
##
## Fitting model with 1 clone
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 2
## Total graph size: 187
##
## Initializing model
##
##
## Fitting model with 2 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 60
## Unobserved stochastic nodes: 2
## Total graph size: 307
##
## Initializing model
##
##
## Fitting model with 5 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 150
## Unobserved stochastic nodes: 2
## Total graph size: 667
##
## Initializing model
summary(Occ.DC)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
## Number of clones = 5
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD DC SD Naive SE Time-series SE R hat
## beta[1] -0.155 0.1777 0.3975 0.001451 0.001743 1
## beta[2] 1.350 0.2579 0.5767 0.002106 0.002691 1
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -0.5010 -0.2764 -0.1559 -0.03285 0.1901
## beta[2] 0.8591 1.1735 1.3451 1.52043 1.8615
plot(Occ.DC)
pairs(Occ.DC)
These are the familiar functions for asymptotic ML inference (they already account for the number of clones):
coef(Occ.DC) # MLE
## beta[1] beta[2]
## -0.1550155 1.3496525
dcsd(Occ.DC) # SE
## beta[1] beta[2]
## 0.397459 0.576738
vcov(Occ.DC) # asymptotic VCV
## beta[1] beta[2]
## beta[1] 0.157973685 -0.006228316
## beta[2] -0.006228316 0.332626678
confint(Occ.DC, level=0.95) # 95% CI
## 2.5 % 97.5 %
## beta[1] -0.9340209 0.6239899
## beta[2] 0.2192669 2.4800381
Let’s check the posterior summaries and the DC diagnostics:
dctable(Occ.DC)
## $`beta[1]`
## n.clones mean sd 2.5% 25% 50% 75%
## 1 1 -0.1280467 0.3745953 -0.8691111 -0.3800893 -0.1219638 0.12364566
## 2 2 -0.1484927 0.2789263 -0.6948277 -0.3341926 -0.1478030 0.03680839
## 3 5 -0.1550155 0.1777491 -0.5010395 -0.2764246 -0.1558643 -0.03285093
## 97.5% r.hat
## 1 0.5952610 1.000780
## 2 0.4013616 1.000626
## 3 0.1900681 1.000130
##
## $`beta[2]`
## n.clones mean sd 2.5% 25% 50% 75% 97.5%
## 1 1 1.134465 0.4867618 0.2279742 0.8030493 1.116657 1.449563 2.130883
## 2 2 1.261849 0.3865462 0.5281616 0.9970322 1.252060 1.514078 2.040573
## 3 5 1.349652 0.2579251 0.8591293 1.1735045 1.345107 1.520426 1.861508
## r.hat
## 1 1.001643
## 2 1.000037
## 3 1.000474
##
## attr(,"class")
## [1] "dctable"
dctable(Occ.DC) |> plot()
dcdiag(Occ.DC)
## n.clones lambda.max ms.error r.squared r.hat
## 1 1 0.2371100 0.015734348 0.002901883 1.001379
## 2 2 0.1494787 0.007890596 0.001157451 1.000639
## 3 5 0.0665697 0.006034254 0.001506192 1.000306
dcdiag(Occ.DC) |> plot()
We hope you can see the pattern in how we are changing the prototype model function and the data preparation. If we want to do normal linear regression or Poisson regression, we can modify the regression program above easily.
The following section illustrates Gaussian linear regression.
n = 30
X1 = rnorm(n)
X = model.matrix(~X1)
beta.true = c(0.5, 1)
link_mu = X %*% beta.true
# Linear regression model
mu = link_mu
sigma.e = 1
Y = rnorm(n,mean=mu,sd=sigma.e)
# MLE
MLE.est = glm(Y ~ X1, family="gaussian")
# Bayesian analysis
Normal.model = function(){
# Likelihood
for (i in 1:n){
mu[i] <- X[i,] %*% beta
Y[i] ~ dnorm(mu[i],prec.e)
}
# Prior
beta[1] ~ dnorm(0, 1)
beta[2] ~ dnorm(0, 1)
prec.e ~ dlnorm(0, 1)
}
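Note that dnorm() in JAGS is parameterized by the precision (the reciprocal of the variance), not the standard deviation. To report the error SD directly, a derived node can be added inside the model function and monitored along with the other parameters:
# add inside the model function:
#   sigma.e <- 1/sqrt(prec.e)
# and monitor it: params=c("beta", "prec.e", "sigma.e")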
dat = list(Y=Y, X=X, n=n)
Normal.Bayes = jags.fit(data=dat, params=c("beta","prec.e"), model=Normal.model)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 3
## Total graph size: 157
##
## Initializing model
summary(Normal.Bayes)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] 0.5172 0.1716 0.001401 0.001388
## beta[2] 0.8064 0.2355 0.001923 0.001923
## prec.e 1.1683 0.3028 0.002472 0.003728
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] 0.1771 0.4061 0.5161 0.6321 0.8524
## beta[2] 0.3429 0.6514 0.8089 0.9609 1.2695
## prec.e 0.6612 0.9547 1.1394 1.3556 1.8460
plot(Normal.Bayes)
pairs(Normal.Bayes)
# MLE using data cloning.
Normal.model_dc = function(){
# Likelihood
for (k in 1:ncl){
for (i in 1:n){
mu[i,k] <- X[i,,k] %*% beta
Y[i,k] ~ dnorm(mu[i,k],prec.e)
}
}
# Prior
beta[1] ~ dnorm(0, 1)
beta[2] ~ dnorm(0, 1)
prec.e ~ dlnorm(0, 1)
}
Y = array(Y, dim=c(n, 1))
X = array(X, dim=c(dim(X), 1))
Y = dcdim(Y)
X = dcdim(X)
dat = list(Y=Y, X=X, n=n, ncl=1)
Normal.DC = dc.fit(data=dat, params=c("beta","prec.e"), model=Normal.model_dc,
n.clones=c(1, 2, 5), unchanged="n", multiply="ncl")
##
## Fitting model with 1 clone
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 3
## Total graph size: 158
##
## Initializing model
##
##
## Fitting model with 2 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 60
## Unobserved stochastic nodes: 3
## Total graph size: 278
##
## Initializing model
##
##
## Fitting model with 5 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 150
## Unobserved stochastic nodes: 3
## Total graph size: 638
##
## Initializing model
summary(Normal.DC)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
## Number of clones = 5
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD DC SD Naive SE Time-series SE R hat
## beta[1] 0.5277 0.07414 0.1658 0.0006054 0.0006167 1
## beta[2] 0.8511 0.10385 0.2322 0.0008479 0.0008432 1
## prec.e 1.2412 0.14237 0.3183 0.0011624 0.0015376 1
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] 0.3838 0.4774 0.5278 0.5770 0.6736
## beta[2] 0.6474 0.7814 0.8511 0.9203 1.0567
## prec.e 0.9766 1.1418 1.2362 1.3346 1.5352
plot(Normal.DC)
pairs(Normal.DC)
We will now modify the code to conduct count data regression using the Poisson distribution and log-link.
n = 30
X1 = rnorm(n)
X = model.matrix(~X1)
beta.true = c(0.5, 1)
link_mu = X %*% beta.true
# Log-linear regression model
mu = exp(link_mu)
Y = rpois(n, mu)
# MLE
MLE.est = glm(Y ~ X1, family="poisson")
# Bayesian analysis
Poisson.model = function(){
# Likelihood
for (i in 1:n){
mu[i] <- exp(X[i,] %*% beta)
Y[i] ~ dpois(mu[i])
}
# Prior
beta[1] ~ dnorm(0, 1)
beta[2] ~ dnorm(0, 1)
}
dat = list(Y=Y, X=X, n=n)
Poisson.Bayes = jags.fit(data=dat, params="beta", model=Poisson.model)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 2
## Total graph size: 186
##
## Initializing model
summary(Poisson.Bayes)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] 0.6106 0.1466 0.001197 0.002083
## beta[2] 1.0124 0.1316 0.001075 0.001858
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] 0.3169 0.5117 0.6158 0.7116 0.8876
## beta[2] 0.7621 0.9224 1.0100 1.1005 1.2759
plot(Poisson.Bayes)
pairs(Poisson.Bayes)
# MLE using data cloning
Poisson.model_dc = function(){
# Likelihood
for (k in 1:ncl){
for (i in 1:n){
mu[i,k] <- exp(X[i,,k] %*% beta)
Y[i,k] ~ dpois(mu[i,k])
}
}
# Prior
beta[1] ~ dnorm(0, 1)
beta[2] ~ dnorm(0, 1)
}
Y = array(Y, dim=c(n, 1))
X = array(X, dim=c(dim(X), 1))
Y = dcdim(Y)
X = dcdim(X)
dat = list(Y=Y, X=X, n=n, ncl=1)
Poisson.DC = dc.fit(data=dat, params="beta", model=Poisson.model_dc,
n.clones=c(1, 2, 5), unchanged="n", multiply="ncl")
##
## Fitting model with 1 clone
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 30
## Unobserved stochastic nodes: 2
## Total graph size: 187
##
## Initializing model
##
##
## Fitting model with 2 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 60
## Unobserved stochastic nodes: 2
## Total graph size: 307
##
## Initializing model
##
##
## Fitting model with 5 clones
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 150
## Unobserved stochastic nodes: 2
## Total graph size: 667
##
## Initializing model
summary(Poisson.DC)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
## Number of clones = 5
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD DC SD Naive SE Time-series SE R hat
## beta[1] 0.6274 0.06589 0.1473 0.0005380 0.0009094 1
## beta[2] 1.0168 0.05954 0.1331 0.0004861 0.0008398 1
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] 0.4970 0.5837 0.6278 0.6729 0.7531
## beta[2] 0.9006 0.9762 1.0163 1.0573 1.1337
plot(Poisson.DC)
pairs(Poisson.DC)